The Fundamental Problem of Individual Risk Estimation for Health AI

Diving into uncertainty in individual risks

Lasai Barreñada

KU Leuven

L. Wynants

KU Leuven

D. Thomassen

LUMC Leiden

B. Van Calster

KU Leuven

E. Steyerberg

UMC Utrecht

2025-03-25

Introduction

Clinical prediction models and individual risk

Clinical prediction models estimate the probability of an event conditional on a set of predictors: \(P(Y \mid X)\).

Before implementation, they need to be validated:

  • Assess the model’s performance in an external dataset
    • Discrimination: AUROC
    • Calibration: calibration plot / reliability diagram
    • Clinical utility: Net Benefit
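As a rough illustration of these validation measures, the sketch below computes them on a small hypothetical validation set. The data, the function names, and the use of a single calibration number (observed event rate minus mean predicted risk) in place of a full calibration plot are all assumptions made for illustration.

```python
# Hypothetical validation data: observed outcomes and predicted risks.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]


def auroc(y_true, risk):
    """Probability that a random event has a higher risk than a random non-event."""
    events = [r for r, o in zip(risk, y_true) if o == 1]
    non_events = [r for r, o in zip(risk, y_true) if o == 0]
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5  # ties count as half
    return wins / (len(events) * len(non_events))


def mean_calibration_gap(y_true, risk):
    """Observed event rate minus mean predicted risk (0 means no average bias)."""
    return sum(y_true) / len(y_true) - sum(risk) / len(risk)


def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose risk is at or above the threshold."""
    n = len(y_true)
    tp = sum(1 for r, o in zip(risk, y_true) if r >= threshold and o == 1)
    fp = sum(1 for r, o in zip(risk, y_true) if r >= threshold and o == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


print("AUROC:", auroc(y, p))
print("Mean calibration gap:", round(mean_calibration_gap(y, p), 4))
print("Net benefit at threshold 0.25:", round(net_benefit(y, p, 0.25), 4))
```

All three are population-level summaries: none of them says how stable the risk of any single patient is.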

Individual treatment decisions require informative individual risks

Uncertainty framework

Approximation uncertainty

Instability of the fitted model.

Definition

  • Different training samples will lead to different individual risks
  • It decreases as the sample size increases
  • Patients with uncommon covariate patterns have more unstable risks
  • It is quantified by confidence intervals

Simulation parameters

  1. Training data sample: 100 bootstrap resamples (drawn with replacement).
  2. Training data size:
    • N = 400
    • N = 2000
    • N = 10000
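The bootstrap procedure above can be sketched as follows. This is not the authors' simulation: it uses synthetic data, a single hypothetical 0 to 4 predictor score whose high values are rare, an assumed true model, and a small hand-written Newton-Raphson logistic fit so that the example runs with the standard library only.

```python
import math
import random
from collections import Counter


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(cell_counts, iterations=10):
    """Newton-Raphson fit of logit(p) = a + b * x from counts of (x, y) pairs."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for (x, y), m in cell_counts.items():
            p = expit(a + b * x)
            w = m * p * (1.0 - p)
            g0 += m * (y - p)
            g1 += m * (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def bootstrap_widths(n, patients, n_boot=100, seed=2025):
    """Width of an approximate 95% percentile interval of the risk, per patient."""
    rng = random.Random(seed)
    # Hypothetical predictor: a 0-4 score where high values are uncommon.
    xs = rng.choices(range(5), weights=[35, 35, 20, 8, 2], k=n)
    # Hypothetical true model, used only to simulate outcomes.
    data = [(x, int(rng.random() < expit(-2.0 + 0.8 * x))) for x in xs]
    risks = {x_new: [] for x_new in patients}
    for _ in range(n_boot):
        resample = Counter(rng.choices(data, k=n))  # draw n rows with replacement
        a, b = fit_logistic(resample)
        for x_new in patients:
            risks[x_new].append(expit(a + b * x_new))
    widths = {}
    for x_new, values in risks.items():
        values.sort()
        lo = values[int(0.025 * (n_boot - 1))]
        hi = values[int(0.975 * (n_boot - 1))]
        widths[x_new] = hi - lo
    return widths


widths = {n: bootstrap_widths(n, patients=(1, 4)) for n in (400, 2000, 10000)}
for n, w in widths.items():
    print(f"N = {n:>5}: width at x=1: {w[1]:.3f}, width at x=4: {w[4]:.3f}")
```

In this setup the interval should narrow as N grows, and the patient with the uncommon score (x = 4) should have a wider interval than the patient with a common score (x = 1).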

Modeling uncertainty

Lack of knowledge about the optimal model

Definition

  • Modeling algorithm
  • Hyperparameter values
  • Predictors to be included and transformations
  • All other choices made by the modeler
  • It is usually not quantified at all



Simulation parameters

  1. Algorithm: logistic regression (LR), random forest (RF) and XGBoost (XGB)
  2. Handling of continuous predictors (LR):
    • Linear
    • Dichotomized at the median or categorized into 4 groups
    • Multivariable fractional polynomials
    • Restricted cubic splines
  3. Variable selection (LR)
    • None
    • Backward elimination \(\alpha = 0.01\) or \(\alpha = 0.20\)
  4. Penalization (LR)
    • No
    • Ridge with \(\lambda\) tuned with AIC
  5. Tree-based methods
    • Minimum node size (RF) / maximum depth (XGB): 2 or 20
    • Tuning: yes or no
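To make one of these modeler choices concrete, the sketch below compares two ways of handling a continuous predictor: dichotomizing at the median and categorizing into 4 groups. It relies on the fact that a logistic regression with a single categorical predictor reproduces the observed event proportion in each group, so no iterative fitting is needed. The data are synthetic and the true risk model is an assumption.

```python
import math
import random

rng = random.Random(7)
n = 2000
# Synthetic data: the true risk rises smoothly with x (assumed for illustration).
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-1.0 + xi))) else 0 for xi in x]


def cut_points(values, n_groups):
    """Empirical quantile cut points that split the values into n_groups groups."""
    s = sorted(values)
    return [s[len(s) * k // n_groups] for k in range(1, n_groups)]


def group_of(value, cuts):
    """Index of the group a value falls in (number of cut points at or below it)."""
    return sum(value >= c for c in cuts)


def fit_categorical(x_train, y_train, n_groups):
    """Return the cut points and the observed event proportion per group."""
    cuts = cut_points(x_train, n_groups)
    events = [0] * n_groups
    totals = [0] * n_groups
    for xi, yi in zip(x_train, y_train):
        g = group_of(xi, cuts)
        events[g] += yi
        totals[g] += 1
    return cuts, [e / t for e, t in zip(events, totals)]


def predict(model, x_new):
    """Estimated risk for a new patient under a categorized-predictor model."""
    cuts, risks = model
    return risks[group_of(x_new, cuts)]


median_split = fit_categorical(x, y, n_groups=2)
quartiles = fit_categorical(x, y, n_groups=4)

for x_new in (0.3, 1.5):
    print(
        f"x = {x_new}: median split {predict(median_split, x_new):.3f}, "
        f"4 groups {predict(quartiles, x_new):.3f}"
    )
```

Both models are fitted on exactly the same data, yet the same patient receives a different risk estimate depending on the categorization chosen.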

Applicability uncertainty

Data collection and population variability

Definition

  • Uncertainty due to data differences
    • Different variable definitions
    • Measurement procedures and error vary
    • Missing data handling
  • Uncertainty due to population differences
    • Case-mix differences in different settings
    • Population drift in the same setting
    • Different inclusion and exclusion criteria


Simulation parameters

  1. Training population sample:
    • Leuven, Belgium
    • Malmo, Sweden
    • Rome, Italy
  2. Handling of missing data:
    • Regression imputation
    • Conditional median imputation
    • Missing indicator imputation
  3. Measurement of lesion:
    • Diameter
    • Volume

Experiment

Aim: evaluate different uncertainty categories in ovarian cancer prediction

  1. Fixed validation set of n = 100 patients from the center in Leuven.

  2. Train models varying the different categories of uncertainty.

  3. Calculate the individual risk for each patient for each model.

  4. Illustrate variability in the individual risks.
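Steps 3 and 4 can be sketched as follows: once every model has produced a risk for every validation patient, the spread per patient is simply the range across models. The risk values below are hypothetical and only illustrate the bookkeeping.

```python
# Hypothetical risks: one row per fitted model, one column per validation patient.
risks = [
    [0.10, 0.45, 0.80],
    [0.15, 0.30, 0.85],
    [0.05, 0.60, 0.75],
    [0.12, 0.50, 0.90],
]


def per_patient_ranges(risk_matrix):
    """Range (max minus min) of the estimated risk across models, per patient."""
    return [max(col) - min(col) for col in zip(*risk_matrix)]


ranges = per_patient_ranges(risks)
mean_range = sum(ranges) / len(ranges)

print("Range per patient:", [round(r, 2) for r in ranges])
print("Average range:", round(mean_range, 3))
```

Averaging the per-patient ranges gives a single summary of how much individual risks vary across the fitted models.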

Preferred modelling:

  • Logistic regression with restricted cubic splines, no penalization and no variable selection
  • Missing data handled with regression imputation
  • Lesion measured as diameter; training data from Leuven

Preferred modelling (1)

[Figure with two panels: training sample 400 and training sample 10000]

Approximation uncertainty (100)

[Figure with two panels: training sample 400 and training sample 10000]

Modelling uncertainty (33)

[Figure with two panels: training sample 400 and training sample 10000]

Modelling and approximation uncertainty (3300)

[Figure with two panels: training sample 400 and training sample 10000]

Applicability (18)

[Figure with two panels: training sample 400 and training sample 10000]

Applicability and approximation (1800)

[Figure with two panels: training sample 400 and training sample 10000]

All sources of uncertainty (59400)

[Figure with two panels: training sample 400 and training sample 10000]

With n = 10000, the range of risk estimates for an individual patient is 39% on average.

Conclusion

  • Prediction models that perform well still yield very uncertain individual risk estimates.
  • Classical uncertainty quantification (confidence intervals) is not enough to capture the uncertainty in individual risks.
  • Approximation uncertainty decreases with sample size, but total uncertainty is dominated by modeling and applicability uncertainty.
  • There is no need to be skeptical about the models: good population-level performance is enough to justify using them as a decision strategy.
  • We should be humble when talking about personalized medicine.
  • There are no individual risks, only risk estimates for individual patients.

What can be done about uncertainty?

  1. Approximation: Use a sufficiently large sample.
  2. Modeling: Better education and use of best practices (guidelines).
  3. Data: Avoid retrospective studies and standardize measurements and definitions.
  4. Population: Conduct multicenter studies to assess heterogeneity.

Embrace uncertainty

Further reading

  1. Van Calster, B. et al. Performance evaluation of predictive AI models to support medical decisions: Overview and guidance. Preprint at arXiv (2024).
  2. Altman, D. G. & Royston, P. What do we mean by validating a prognostic model? Statist. Med. 19, 453–473 (2000).
  3. Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110, 457–506 (2021).
  4. Gruber, C. et al. Sources of Uncertainty in Machine Learning – A Statisticians’ View. Preprint at arXiv (2023).
  5. Riley, R. D. & Collins, G. S. Stability of clinical prediction models developed using statistical or machine learning methods. Biometrical Journal, 2200302 (2023).
  6. Riley, R. D. et al. Clinical prediction models and the multiverse of madness. BMC Medicine 21, 502 (2023).
  7. Tsegaye, B. et al. Larger sample sizes are needed when developing a clinical prediction model using machine learning in oncology: methodological systematic review. Journal of Clinical Epidemiology 180 (2025).
  8. Pate, A. et al. The uncertainty with using risk prediction models for individual decision making: an exemplar cohort study examining the prediction of cardiovascular disease in English primary care. BMC Medicine 17, 134 (2019).
  9. Stern, R. H. Individual Risk. The Journal of Clinical Hypertension 14, 261–264 (2012).